Financial assets exhibit hierarchical structure: they belong to classes such as stocks or bonds, are country-specific, fall into sectors and industries, and load on other common risk factors.

Stocks may have different conditional expectations given their sector, for example. To build models that perform well, these differences must be captured.

In the following, a panel data set of asset returns is simulated. By construction, the unconditional expectation (i.e., the expectation ignoring the dependence on sectors) is equal to zero. Hence, a pooled model is unable to fit the data.

There are different ways to fit the model:

  • fit a pooled model (here: a good fit is impossible by construction)
  • fit the data independently per sector (inefficient)
  • encode the sector as a feature and fit a pooled model
  • use a hierarchical model

Why are hierarchical models needed when neural networks are very good at learning asset encodings internally, given relevant features? Consider what happens if each asset has a different, independent data generating function. Then you would need a one-hot encoding matrix with dimensions $np \times p$, where $n$ is the number of observations per asset and $p$ the number of assets. If $n$ is not very large relative to $p$, the asset encoding might be hard to learn. A Bayesian approach allows each asset-specific neural network to deviate from the group model if and only if there is sufficient evidence to conclude that it follows a different data generating process.
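As a concrete illustration, the design matrix for such a fully independent specification would contain a one-hot block like the following (sizes here are hypothetical):

```python
import numpy as np

n, p = 4, 3  # hypothetical: n observations per asset, p assets

# Pooled panel with one row per (asset, observation) pair and one
# indicator column per asset: an (n*p) x p one-hot matrix.
one_hot = np.kron(np.eye(p), np.ones((n, 1)))

print(one_hot.shape)
```

With few observations per asset, estimating a separate effect for each indicator column quickly becomes data hungry.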

In a Bayesian framework, we could also model the multiple levels of the hierarchy explicitly. The overall net may serve as a prior for sector-level nets, which, in turn, may serve as priors for asset-specific nets.

def mse(y_hat, y):
    return ((y_hat - y)**2).mean()
    
def get_ir(log_rets, Y_hat, inv_cov_tensor):

    w = torch.einsum('np, npj -> nj', Y_hat, inv_cov_tensor).detach().numpy()
    opt_weights = w / np.abs(w).sum(axis=1)[:, None]
    portfolio_ret = np.einsum('ij, ij->i', opt_weights, log_rets)
    ir = np.mean(portfolio_ret) / np.std(portfolio_ret) * np.sqrt(252)
    return ir, portfolio_ret
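As a quick sanity check of the helpers above on purely synthetic inputs (sizes and seed are arbitrary): get_ir maps predictions through the inverse covariances into portfolio weights, rescales them to unit gross exposure per day, and annualizes the resulting Sharpe-style ratio with the square root of 252.

```python
import numpy as np
import torch

def mse(y_hat, y):
    return ((y_hat - y)**2).mean()

def get_ir(log_rets, Y_hat, inv_cov_tensor):
    # Markowitz-style weights: inverse covariance times predicted returns,
    # rescaled so the absolute weights sum to one on each day.
    w = torch.einsum('np, npj -> nj', Y_hat, inv_cov_tensor).detach().numpy()
    opt_weights = w / np.abs(w).sum(axis=1)[:, None]
    portfolio_ret = np.einsum('ij, ij->i', opt_weights, log_rets)
    ir = np.mean(portfolio_ret) / np.std(portfolio_ret) * np.sqrt(252)
    return ir, portfolio_ret

rng = np.random.default_rng(0)
n, p = 50, 5  # arbitrary: 50 days, 5 assets
log_rets = rng.normal(0, 0.01, (n, p))
Y_hat = torch.tensor(rng.normal(0, 0.01, (n, p)))
inv_cov_tensor = torch.tensor(np.stack([np.eye(p)] * n))
ir, rets = get_ir(log_rets, Y_hat, inv_cov_tensor)
```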

Data generation

The following snippet generates a panel data set with a sector-dependent nonlinear conditional expectation. You may want to open it in Colab and experiment with different values for n_stocks, n_days, and n_ind.

def add_daily_trend_to_log_price(log_price, feature_list, beta_list):
    """
    Add a daily trend to high-frequency log prices (modifies log_price in place).
    Relies on the global ``n_periods``.
    """
    for beta, feature in zip(beta_list, feature_list):
        signal = pd.DataFrame(log_price).copy()
        signal.iloc[:, :] = np.nan

        for i in set(signal.index.date):
            for c in signal.columns:
                signal.loc[signal.index.date == i, c] =\
                   feature[feature.index == i][c].values * beta / n_periods
        signal_cumsum = signal.cumsum()
        log_price += signal_cumsum


# Hyperparams
n_stocks = 50
n_days = 100
n_ind = n_stocks//2
pct_train = 0.5


factor_loadings = np.linspace(1, 3, n_stocks)[:, None]
npi = int(n_stocks/n_ind)
industry_loadings = np.zeros((n_stocks, n_ind))
for i in range(n_ind):
    industry_loadings[i*npi:(i+1)*npi, i] = np.ones(npi)[:]
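To make the resulting block structure concrete, here is a tiny version with illustrative sizes; each stock loads on exactly one industry:

```python
import numpy as np

n_stocks, n_ind = 6, 3        # illustrative: 6 stocks, 3 industries
npi = n_stocks // n_ind       # stocks per industry

industry_loadings = np.zeros((n_stocks, n_ind))
for i in range(n_ind):
    # Consecutive blocks of npi stocks share an industry.
    industry_loadings[i*npi:(i+1)*npi, i] = 1.0

print(industry_loadings)
```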

# beta
feature_beta = industry_loadings @ np.linspace(-1, 1, n_ind)
# np.random.shuffle(feature_beta)

# Use only intraday returns for cov estimation
d = 1

# High-frequency prices are irregularly sampled
# proportion of sampled prices
liq = 1  # 0.8

# Prices are contaminated with micro structure noise
# 10ct per $100 ms noise
gamma = 0  # 0.1/100

# mins per day
n_periods = 6.5*60

# GARCH(1,1) spec
sigma_sq_0 = 0.1**2/250/n_periods
garch_alpha = 0  # 0.019
garch_beta = 0  # 0.98
omega = sigma_sq_0*(1-garch_alpha-garch_beta)

# preaveraging parameters
k = int(0.5*(n_periods)**(0.5))

pairwise = True

# NERIVE parameters
L = 4
stp = hd._get_partitions(L)

# condition number targeting parameters
max_cond_target = 800
max_iter = 100
step = 0.05

estimators = {
                #  'realized_cov': partial(hf.mrc, pairwise=False, theta=0),
                  # 'msrc': partial(hf.msrc, pairwise=pairwise, M=M, N=N),
                  'mrc': partial(hf.mrc, pairwise=pairwise, theta=None, k=k),
                  # 'hy': partial(hf.hayashi_yoshida, theta=None, k=k),
                  # 'krvm': partial(hf.krvm, H=H, pairwise=pairwise,
                  #  kernel=hf.parzen_kernel),
                  # 'epic': partial(hf.ensemble, var_weights=np.ones(4)/4,
                  # cov_weights=np.ones(4)/4),
                 }


u = sim.Universe(0,
          [sigma_sq_0, 0, garch_alpha, garch_beta, omega],
          [sigma_sq_0, 0, garch_alpha, garch_beta, omega],
          [sigma_sq_0, 0, garch_alpha, garch_beta, omega],
          factor_loadings,
          industry_loadings,
          liq,
          gamma,
          'm',
          )
u.simulate(n_days)
price = u.price

feature_daily = np.random.normal(0, 0.1, n_days*n_stocks).reshape(n_days, n_stocks)
feature2_daily = np.random.normal(0, 0.1, n_days*n_stocks).reshape(n_days, n_stocks)

# add a daily predictable mean to high-frequency data
signal = feature_beta * feature_daily * feature2_daily

signal_df = pd.DataFrame(signal,
                          index=pd.Series(list(set(price.index.date))).sort_values(),
                          columns=price.columns)
log_price = np.log(price)
add_daily_trend_to_log_price(log_price, [signal_df], [1.])
price = np.exp(log_price)

price_daily_close = price.resample('1d').last()
price_daily_open = price.resample('1d').first()
log_rets_daily = np.log(price_daily_close) - np.log(price_daily_open)

X = torch.tensor(np.stack([feature_daily, feature2_daily])).permute(1, 0, 2).float()
Y = torch.tensor(log_rets_daily.values).float()

n_time_steps_train = int(n_days*pct_train)
n_time_steps_valid = n_days - n_time_steps_train

covs_nerive = {key: [] for key in estimators.keys()}

# Compute the integrated covariance matrices.
for i in log_rets_daily.index.date[:n_time_steps_train]:
    p = price[(price.index.date < i+timedelta(days=d))&(price.index.date > i-timedelta(days=d))]
    l = [hf.get_cumu_demeaned_resid(p.iloc[:, j]) for j in range(price.shape[1])]
    for name, estimator in estimators.items():
        cov = estimator(l)
        # covs[name].append(cov)

        if max_cond_target is None:
            covs_nerive[name].append(hd.nerive(l, estimator=estimator, stp=stp))
        else:
            covs_nerive[name].append(hd.linear_shrink_target(
                hd.nerive(l, estimator=estimator, stp=stp),
                max_cond_target, step, max_iter))

# split into train and valid set
Y_train = Y[:n_time_steps_train, :]
X_train = X[:n_time_steps_train, :, :]

Y_valid = Y[n_time_steps_train:, :]
X_valid = X[n_time_steps_train:, :, :]

# normalize
train_mean = X_train.mean(keepdim=True, dim=(0, 2))
train_std = X_train.std(keepdim=True, dim=(0, 2))
X_train = (X_train - train_mean) / train_std
X_valid = (X_valid - train_mean) / train_std
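The block above standardizes each feature with train-set statistics only; the mean and standard deviation have shape (1, n_features, 1) and broadcast over time and assets, and the validation set deliberately reuses the train statistics to avoid look-ahead. A minimal check with illustrative shapes:

```python
import torch

torch.manual_seed(0)
# (time, feature, asset) panels with illustrative sizes and scales
X_train = torch.randn(50, 2, 10) * 3 + 1
X_valid = torch.randn(20, 2, 10) * 3 + 1

# Per-feature statistics over time (dim 0) and assets (dim 2)
train_mean = X_train.mean(dim=(0, 2), keepdim=True)  # shape (1, 2, 1)
train_std = X_train.std(dim=(0, 2), keepdim=True)

X_train = (X_train - train_mean) / train_std
X_valid = (X_valid - train_mean) / train_std  # train stats, no look-ahead
```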
inv_cov_tensor = torch.tensor([np.linalg.inv(c) for c in u.cond_cov()])[n_time_steps_train:]

The following covariance matrix and dendrogram show the hierarchical structure of asset returns.

cov_avg = np.array(covs_nerive['mrc'])[:n_time_steps_train].mean(0)
corr = hd.to_corr(cov_avg)
sns.clustermap(corr)
from scipy.cluster.hierarchy import ward, dendrogram

linkage_matrix = ward(corr) 

fig, ax = plt.subplots(figsize=(10, 10))
ax = dendrogram(linkage_matrix, orientation="right");

plt.tick_params(
    axis='x',
    which='both',
    bottom=False,
    top=False,
    labelbottom=False,
)

By construction, the returns are not predictable without conditioning on the sector. The figure below illustrates this.

n = n_time_steps_train
p = n_stocks

x1 = X_train[:,0,:]
x2 = X_train[:,1,:]
y = Y_train


fig = plt.figure()
ax = plt.axes(projection='3d')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_zlabel('y');
ax.scatter(x1, x2, y, c=y, cmap='plasma', linewidth=0.0);
plt.title("Log-returns without conditioning on sectors.")
plt.show()
x1_train = np.empty((n_ind, X_train.shape[0], npi))
x2_train = np.empty_like(x1_train)
y_train = np.empty_like(x1_train)

x1_valid = np.empty((n_ind, X_valid.shape[0], npi))
x2_valid = np.empty_like(x1_valid)
y_valid = np.empty_like(x1_valid)

for i in range(n_ind):
  x1_train[i,:,:] = X_train[:,0,i*npi:(i+1)*npi]
  x2_train[i,:,:] = X_train[:,1,i*npi:(i+1)*npi]
  y_train[i,:,:] = Y_train[:, i*npi:(i+1)*npi]


  x1_valid[i,:,:] = X_valid[:,0,i*npi:(i+1)*npi]
  x2_valid[i,:,:] = X_valid[:,1,i*npi:(i+1)*npi]
  y_valid[i,:,:] = Y_valid[:, i*npi:(i+1)*npi]

The following figure shows the train-set data points per sector. There are not enough observations to fit a separate neural network to each sector without informative priors.

fig = plt.figure(figsize=(13,13))
fig.suptitle('Train set per sector.', fontsize=20)

for i in range(n_ind):
  ax = fig.add_subplot(int(np.ceil(np.sqrt(n_ind))), int(np.ceil(np.sqrt(n_ind))),
                       i+1, projection='3d')
  ax.set_xlabel('$x_1$', fontsize=8)
  ax.set_ylabel('$x_2$', fontsize=8)
  ax.set_zlabel('y', fontsize=8);
  ax.tick_params(labelsize=8);
  ax.set_xlim(-2, 2)
  ax.set_ylim(-2, 2)
  ax.set_zlim(-0.01, 0.01)
  ax.scatter(x1_train[i,:,:], x2_train[i,:,:], y_train[i,:,:],
             c=y_train[i,:,:], cmap='viridis', linewidth=0.0);

plt.show()

The next figure shows the test set data points per sector.

fig = plt.figure(figsize=(13,13))
fig.suptitle('Test set per sector.', fontsize=20)

for i in range(n_ind):
  ax = fig.add_subplot(int(np.ceil(np.sqrt(n_ind))), int(np.ceil(np.sqrt(n_ind))),
                       i+1, projection='3d')
  ax.set_xlabel('$x_1$', fontsize=8)
  ax.set_ylabel('$x_2$', fontsize=8)
  ax.set_zlabel('y', fontsize=8);
  ax.tick_params(labelsize=8);
  ax.set_xlim(-2, 2)
  ax.set_ylim(-2, 2)
  ax.set_zlim(-0.01, 0.01)
  ax.scatter(x1_valid[i,:,:], x2_valid[i,:,:], y_valid[i,:,:],
             c=y_valid[i,:,:], cmap='viridis', linewidth=0.0);

plt.show()

Hierarchical model

The following code defines a hierarchical Bayesian neural network with two hidden layers.

# This idea comes from:
# https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/.
# Non-centered specification of hierarchical model:
# see https://twiecki.github.io/blog/2017/02/08/bayesian-hierchical-non-centered/
# Why?:
# https://www.youtube.com/watch?v=gSd1msFFZTw

X_train = np.stack((x1_train.reshape(n_ind, npi*n_time_steps_train),
                    x2_train.reshape(n_ind, npi*n_time_steps_train)), 2)
Y_train = y_train.reshape(n_ind, npi*n_time_steps_train)

ann_input = theano.shared(X_train)
ann_output = theano.shared(Y_train)

n_hidden = [10, 10]
n_data = X_train.shape[2]
n_grps = n_ind

# Initialize random weights between each layer
init_1 = np.random.randn(n_data, n_hidden[0]).astype(theano.config.floatX)
init_2 = np.random.randn(n_hidden[0], n_hidden[1]).astype(theano.config.floatX)
init_out = np.random.randn(n_hidden[1]).astype(theano.config.floatX)
    
with pm.Model() as neural_network:
    # Group mean distribution for input to hidden layer
    weights_in_1_grp = pm.Normal('w_in_1_grp', 0, sd=1, 
                              shape=(n_data, n_hidden[0]), 
                              testval=init_1)
    # Group standard-deviation
    weights_in_1_grp_sd = pm.HalfNormal('w_in_1_grp_sd', sd=1.)
    
    # Group mean distribution for weights from 1st to 2nd layer
    weights_1_2_grp = pm.Normal('w_1_2_grp', 0, sd=1, 
                            shape=(n_hidden[0], n_hidden[1]), 
                            testval=init_2)
    weights_1_2_grp_sd = pm.HalfNormal('w_1_2_grp_sd', sd=1.)
    
    # Group mean distribution from hidden layer to output
    weights_2_out_grp = pm.Normal('w_2_out_grp', 0, sd=1, 
                              shape=(n_hidden[1],), 
                              testval=init_out)
    weights_2_out_grp_sd = pm.HalfNormal('w_2_out_grp_sd', sd=1.)

    # Separate weights for each different model, just add a 3rd dimension
    # of weights
    weights_in_1_raw = pm.Normal('w_in_1', 
                                  shape=(n_grps, n_data, n_hidden[0]))
    
    weights_in_1 = weights_in_1_raw * weights_in_1_grp_sd + weights_in_1_grp
    
    weights_1_2_raw = pm.Normal('w_1_2', 
                                shape=(n_grps, n_hidden[0], n_hidden[1]))
    weights_1_2 = weights_1_2_raw * weights_1_2_grp_sd + weights_1_2_grp
    
    weights_2_out_raw = pm.Normal('w_2_out', 
                                  shape=(n_grps, n_hidden[1]))
    
    weights_2_out = weights_2_out_raw * weights_2_out_grp_sd + weights_2_out_grp

    
    
    # Build neural-network using relu activation function
    act_1 = tt.nnet.relu(tt.batched_dot(ann_input, 
                         weights_in_1))
    act_2 = tt.nnet.relu(tt.batched_dot(act_1, 
                         weights_1_2))
    
    # linear output layer
    intercept = pm.Normal('intercept', mu=0, sd=10)
    act_out = tt.batched_dot(act_2, weights_2_out) + intercept
        
    sigma = pm.HalfNormal('sigma', sd=10)
    out = pm.Normal('out', 
                    act_out,
                    sigma,
                    observed=ann_output)
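The *_raw weights above implement the non-centered parameterization: instead of sampling each group member's weight directly from N(mu, sigma), a standard-normal draw is shifted and scaled as w = mu + sigma * w_raw, which decouples the group-level and member-level parameters in the posterior and typically samples much better. The two forms define the same distribution, as a quick Monte Carlo check suggests (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sd = 0.5, 2.0  # illustrative group mean and standard deviation

# Non-centered: draw raw ~ N(0, 1), then shift and scale.
raw = rng.standard_normal(100_000)
w_noncentered = raw * sd + mu

# Centered equivalent: draw w ~ N(mu, sd) directly.
w_centered = rng.normal(mu, sd, 100_000)
```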
with neural_network:
    trace = pm.sample(init='advi+adapt_diag',  n_init=200000, tune=50, chains=1, 
                     nuts_kwargs={'target_accept': 0.9},)
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Finished [100%]: Average Loss = -6,472.8
Sequential sampling (1 chains in 1 job)
NUTS: [sigma, intercept, w_2_out, w_1_2, w_in_1, w_2_out_grp_sd, w_2_out_grp, w_1_2_grp_sd, w_1_2_grp, w_in_1_grp_sd, w_in_1_grp]
There were 9 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.4623924185795791, but should be close to 0.9. Try to increase the number of tuning steps.
Only one chain was sampled, this makes it impossible to run some convergence checks

Train set performance

ppc = pm.sample_posterior_predictive(trace, model=neural_network, samples=1000)
y_pred = ppc['out'].mean(0)
mse(y_pred, Y_train)
0.00024471051017955266

Test set performance

X_valid = np.stack((x1_valid.reshape(n_ind, npi*n_time_steps_valid),
                    x2_valid.reshape(n_ind, npi*n_time_steps_valid)), 2)
Y_valid = y_valid.reshape(n_ind, npi*n_time_steps_valid)

ann_input.set_value(X_valid)
ann_output.set_value(Y_valid)
ppc = pm.sample_posterior_predictive(trace, model=neural_network, samples=1000)
y_hat_valid = ppc['out'].mean(0)
mse(y_hat_valid, Y_valid)    
0.00035460431075137717

The figure below shows the predictions (green) and targets (blue) per sector for the hierarchical model.

fig = plt.figure(figsize=(13,13))
fig.suptitle('Predictions per sector.', fontsize=20)

for i in range(n_ind):
  ax = fig.add_subplot(int(np.ceil(np.sqrt(n_ind))), int(np.ceil(np.sqrt(n_ind))),
                       i+1, projection='3d')
  ax.set_xlabel('$x_1$', fontsize=8)
  ax.set_ylabel('$x_2$', fontsize=8)
  ax.set_zlabel('y', fontsize=8);
  ax.tick_params(labelsize=8);
  ax.set_xlim(-2, 2)
  ax.set_ylim(-2, 2)
  ax.set_zlim(-0.01, 0.01)
  ax.scatter(x1_valid[i,:,:], x2_valid[i,:,:], y_hat_valid[i,:],
             #c=y_pred[i,:],
             #cmap='Greens',
             c='green',
             linewidth=0.0);
  ax.scatter(x1_valid[i,:,:], x2_valid[i,:,:], y_valid[i,:,:],
             #c=y_valid[i,:,:],
             #cmap='gray',
             c='blue',
             linewidth=0.0);


plt.show()
ir, rets = get_ir(log_rets=torch.tensor(Y_valid.reshape(n_stocks, n_time_steps_valid)).T,
       Y_hat=torch.tensor(y_hat_valid.reshape(n_stocks, n_time_steps_valid)).T,
       inv_cov_tensor=inv_cov_tensor)
print('Test set information ratio:', ir)
plt.plot(np.cumsum(rets))
plt.title('Test set cumulative portfolio log-returns')
plt.xlabel('days')
plt.ylabel('cumulative portfolio log-returns')
Test set information ratio: 23.54194763228627

Industry one-hot encoding as feature

X_train = np.stack((x1_train.reshape(n_ind*npi*n_time_steps_train),
                    x2_train.reshape(n_ind*npi*n_time_steps_train)), 1)

# Comment out the following line to fit a pooled model without the industry feature
X_train = np.column_stack((X_train,
                           np.repeat(industry_loadings, n_time_steps_train, 0)))

Y_train = y_train.reshape(n_ind*npi*n_time_steps_train)


ann_input = theano.shared(X_train)
ann_output = theano.shared(Y_train)

# n_hidden = [10, 5]
n_data = X_train.shape[1]


# Initialize random weights between each layer
init_1 = np.random.randn(n_data, n_hidden[0]).astype(theano.config.floatX)
init_2 = np.random.randn(n_hidden[0], n_hidden[1]).astype(theano.config.floatX)
init_out = np.random.randn(n_hidden[1]).astype(theano.config.floatX)
    
with pm.Model() as neural_network:
    weights_in_1 = pm.Normal('w_in_1_grp', 0, sd=1, 
                              shape=(n_data, n_hidden[0]), 
                              testval=init_1)

    weights_1_2 = pm.Normal('w_1_2_grp', 0, sd=1, 
                            shape=(n_hidden[0], n_hidden[1]), 
                            testval=init_2)

    weights_2_out = pm.Normal('w_2_out_grp', 0, sd=1, 
                              shape=(n_hidden[1],), 
                              testval=init_out)

    # Build neural-network using relu activation function
    act_1 = tt.nnet.relu(tt.dot(ann_input, 
                         weights_in_1))
    act_2 = tt.nnet.relu(tt.dot(act_1, 
                         weights_1_2))
    
    # linear output layer
    intercept = pm.Normal('intercept', mu=0, sd=10)
    act_out = tt.dot(act_2, weights_2_out) + intercept
        
    sigma = pm.HalfNormal('sigma', sd=10)
    out = pm.Normal('out', 
                    act_out,
                    sigma,
                    observed=ann_output)
with neural_network:
    trace = pm.sample(
                     init='advi+adapt_diag', tune=50, chains=1,  n_init=200000,
                     nuts_kwargs={'target_accept': 0.9},)
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Average Loss = -5,807.8
Convergence achieved at 42700
Interrupted at 42,699 [21%]: Average Loss = -824.93
Sequential sampling (1 chains in 1 job)
NUTS: [sigma, intercept, w_2_out_grp, w_1_2_grp, w_in_1_grp]
There were 77 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5934816874673563, but should be close to 0.9. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
Only one chain was sampled, this makes it impossible to run some convergence checks

Train set performance

ppc = pm.sample_posterior_predictive(trace, model=neural_network, samples=1000)
y_hat = ppc['out'].mean(0).reshape(n_ind, npi*n_time_steps_train)
mse(y_hat, Y_train.reshape(n_ind, npi*n_time_steps_train))    
0.00023132680868411578

Test set performance

X_valid = np.stack((x1_valid.reshape(n_ind*npi*n_time_steps_valid),
                    x2_valid.reshape(n_ind*npi*n_time_steps_valid)), 1)

X_valid = np.column_stack((X_valid, np.repeat(industry_loadings, n_time_steps_valid, 0)))
Y_valid = y_valid.reshape(n_ind*npi*n_time_steps_valid)
ann_input.set_value(X_valid)
ann_output.set_value(Y_valid)
ppc = pm.sample_posterior_predictive(trace, model=neural_network, samples=5000)
y_hat_valid = ppc['out'].mean(0).reshape(n_ind, npi*n_time_steps_valid)
mse(y_hat_valid, Y_valid.reshape(n_ind, npi*n_time_steps_valid))    
0.00036157395132278554

The figure below shows the predictions (green) and targets (blue) per sector for the standard model.

fig = plt.figure(figsize=(13,13))
fig.suptitle('Predictions per sector.', fontsize=20)

for i in range(n_ind):
  ax = fig.add_subplot(int(np.ceil(np.sqrt(n_ind))), int(np.ceil(np.sqrt(n_ind))),
                       i+1, projection='3d')
  ax.set_xlabel('$x_1$', fontsize=8)
  ax.set_ylabel('$x_2$', fontsize=8)
  ax.set_zlabel('y', fontsize=8);
  ax.tick_params(labelsize=8);
  ax.set_xlim(-2, 2)
  ax.set_ylim(-2, 2)
  ax.set_zlim(-0.01, 0.01)
  ax.scatter(x1_valid[i,:,:], x2_valid[i,:,:], y_hat_valid[i,:],
             #c=y_pred[i,:],
             #cmap='Greens',
             c='green',
             linewidth=0.0);
  ax.scatter(x1_valid[i,:,:], x2_valid[i,:,:], y_valid[i,:,:],
             #c=y_valid[i,:,:],
             #cmap='gray',
             c='blue',
             linewidth=0.0);


plt.show()
ir, rets = get_ir(log_rets=torch.tensor(Y_valid.reshape(n_stocks, n_time_steps_valid)).T,
       Y_hat=torch.tensor(y_hat_valid.reshape(n_stocks, n_time_steps_valid)).T,
       inv_cov_tensor=inv_cov_tensor)
print('Test set information ratio:', ir)
plt.plot(np.cumsum(rets))
plt.title('Test set cumulative portfolio log-returns')
plt.xlabel('days')
plt.ylabel('cumulative portfolio log-returns')
Test set information ratio: 22.910427663560622

Eigenvectors of the covariance matrix as encodings

Here, the eigenvectors of the high-frequency estimate of the integrated covariance matrix, averaged over all days of the training set, serve as asset encodings. The true covariance matrix has a low-rank plus block-diagonal plus diagonal structure.
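To see why eigenvectors can act as sector encodings, consider an illustrative block-diagonal covariance with two sectors of different within-sector correlation: the leading eigenvectors are each supported on a single sector's block, so they effectively label group membership.

```python
import numpy as np

# Two 3-asset sectors with within-sector correlations 0.8 and 0.5.
b1 = 0.8 * np.ones((3, 3)) + 0.2 * np.eye(3)
b2 = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)
cov = np.block([[b1, np.zeros((3, 3))],
                [np.zeros((3, 3)), b2]])

evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The two leading eigenvectors each live on one sector's block.
top2 = evecs[:, -2:]
print(np.round(top2, 2))
```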

X_train = np.stack((x1_train.reshape(n_ind*npi*n_time_steps_train),
                    x2_train.reshape(n_ind*npi*n_time_steps_train)), 1)

# number of principal components to keep; 0 keeps all (since [:, -0:] selects every column)
n_pc = 0

evectors = np.linalg.eigh(cov_avg)[1][:, -n_pc:]
# evectors = np.linalg.eigh(u.uncond_cov())[1][:, -n_pc:]

X_train = np.column_stack((X_train,
                           np.repeat(evectors,
                                     n_time_steps_train, 0)))

Y_train = y_train.reshape(n_ind*npi*n_time_steps_train)


ann_input = theano.shared(X_train)
ann_output = theano.shared(Y_train)

# n_hidden = [10, 5]
n_data = X_train.shape[1]


# Initialize random weights between each layer
init_1 = np.random.randn(n_data, n_hidden[0]).astype(theano.config.floatX)
init_2 = np.random.randn(n_hidden[0], n_hidden[1]).astype(theano.config.floatX)
init_out = np.random.randn(n_hidden[1]).astype(theano.config.floatX)
    
with pm.Model() as neural_network:
    weights_in_1 = pm.Normal('w_in_1_grp', 0, sd=1, 
                              shape=(n_data, n_hidden[0]), 
                              testval=init_1)

    weights_1_2 = pm.Normal('w_1_2_grp', 0, sd=1, 
                            shape=(n_hidden[0], n_hidden[1]), 
                            testval=init_2)

    weights_2_out = pm.Normal('w_2_out_grp', 0, sd=1, 
                              shape=(n_hidden[1],), 
                              testval=init_out)

    # Build neural-network using relu activation function
    act_1 = tt.nnet.relu(tt.dot(ann_input, 
                         weights_in_1))
    act_2 = tt.nnet.relu(tt.dot(act_1, 
                         weights_1_2))
    
    # linear output layer
    intercept = pm.Normal('intercept', mu=0, sd=10)
    act_out = tt.dot(act_2, weights_2_out) + intercept
        
    sigma = pm.HalfNormal('sigma', sd=10)
    out = pm.Normal('out', 
                    act_out,
                    sigma,
                    observed=ann_output)
 
with neural_network:
    trace = pm.sample(
                     init='advi+adapt_diag', tune=50, chains=1, n_init=200000,
                     nuts_kwargs={'target_accept': 0.9},)
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Average Loss = -6,392.4
Convergence achieved at 56300
Interrupted at 56,299 [28%]: Average Loss = -1,968.1
Sequential sampling (1 chains in 1 job)
NUTS: [sigma, intercept, w_2_out_grp, w_1_2_grp, w_in_1_grp]
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6732817678810623, but should be close to 0.9. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
Only one chain was sampled, this makes it impossible to run some convergence checks

Train set performance

ppc = pm.sample_posterior_predictive(trace, model=neural_network, samples=1000)
y_hat = ppc['out'].mean(0).reshape(n_ind, npi*n_time_steps_train)
mse(y_hat, Y_train.reshape(n_ind, npi*n_time_steps_train))    
0.0002160451662617625

Test set performance

X_valid = np.stack((x1_valid.reshape(n_ind*npi*n_time_steps_valid),
                    x2_valid.reshape(n_ind*npi*n_time_steps_valid)), 1)

X_valid = np.column_stack((X_valid, np.repeat(evectors, n_time_steps_valid, 0)))
Y_valid = y_valid.reshape(n_ind*npi*n_time_steps_valid)
ann_input.set_value(X_valid)
ann_output.set_value(Y_valid)
ppc = pm.sample_posterior_predictive(trace, model=neural_network, samples=1000)
y_hat_valid = ppc['out'].mean(0).reshape(n_ind, npi*n_time_steps_valid)
mse(y_hat_valid, Y_valid.reshape(n_ind, npi*n_time_steps_valid))    
0.00036303203719532354
ir, rets = get_ir(log_rets=torch.tensor(Y_valid.reshape(n_stocks, n_time_steps_valid)).T,
       Y_hat=torch.tensor(y_hat_valid.reshape(n_stocks, n_time_steps_valid)).T,
       inv_cov_tensor=inv_cov_tensor)
print('Test set information ratio:', ir)
plt.plot(np.cumsum(rets))
plt.title('Test set cumulative portfolio log-returns')
plt.xlabel('days')
plt.ylabel('cumulative portfolio log-returns')
Test set information ratio: 20.179354474730747

PyTorch deep ensemble

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim


class Net(nn.Module):
    def __init__(self, n_neurons, n_features, dropout):
        super().__init__()

        self.fc1 = nn.Linear(n_features, n_neurons)
        self.fc2 = nn.Linear(n_neurons, n_neurons)
        self.fc4 = nn.Linear(n_neurons, n_neurons)
        self.fc5 = nn.Linear(n_neurons, n_neurons)
        self.fc3 = nn.Linear(n_neurons, 1)
        self.fc1bn = nn.BatchNorm1d(n_neurons)
        self.fc2bn = nn.BatchNorm1d(n_neurons)
        self.fc4bn = nn.BatchNorm1d(n_neurons)
        self.fc5bn = nn.BatchNorm1d(n_neurons)
        self.drop_layer = nn.Dropout(p=dropout)

    def forward(self, X):
        X = F.relu(self.fc1bn(self.fc1(X)))
        X = self.drop_layer(X)
        X = F.relu(self.fc2bn(self.fc2(X)))
        X = self.drop_layer(X)
        # X = F.relu(self.fc4bn(self.fc4(X)))
        # X = self.drop_layer(X)
        # X = F.relu(self.fc5bn(self.fc5(X)))
        # X = self.drop_layer(X)

        X = self.fc3(X)
        return X
from tqdm import tqdm

def fit_model(model, epochs, X, Y, X_valid, Y_valid):
    optimizer = optim.Adam(model.parameters(), weight_decay=0., lr=1e-2)
    for epoch in range(epochs):
        # training mode
        model.train()
        model.zero_grad()
        Y_hat = model(X)
        loss = criterion(Y_hat, Y)
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 1000 == 0:
            with torch.no_grad():
                model.eval()
                Y_hat = model(X_valid)
                print(f'Epoch: {epoch+1} \t Train loss: {loss} \t Valid loss: {criterion(Y_hat, Y_valid)}')




def predict(model, X):
    with torch.no_grad():
        model.eval()
        Y_hat = model(X)
    return Y_hat
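For reference, the ensembling step itself is just an average of the member networks' predictions; a minimal sketch with throwaway one-layer models (not the `Net` defined above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_models, n_features = 5, 3  # illustrative ensemble size and input width

# Independently initialized ensemble members.
models = [nn.Linear(n_features, 1) for _ in range(n_models)]

X = torch.randn(8, n_features)
with torch.no_grad():
    preds = torch.stack([m(X) for m in models])  # (n_models, 8, 1)
    ensemble_pred = preds.mean(dim=0)            # average over members
```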
N = 30
n_neurons = 10
dropout = 0.
epochs = 3000
criterion = nn.MSELoss()

# Train set
X_train = np.stack((x1_train.reshape(n_ind*npi*n_time_steps_train),
                    x2_train.reshape(n_ind*npi*n_time_steps_train)), 1)
X_train = np.column_stack((X_train,
                           np.repeat(industry_loadings, n_time_steps_train, 0)))
Y_train = y_train.reshape(n_ind*npi*n_time_steps_train, 1)
X_train = torch.tensor(X_train).float()
Y_train = torch.tensor(Y_train).float()

# Validation set
X_valid = np.stack((x1_valid.reshape(n_ind*npi*n_time_steps_valid),
                    x2_valid.reshape(n_ind*npi*n_time_steps_valid)), 1)
X_valid = np.column_stack((X_valid, np.repeat(industry_loadings, n_time_steps_valid, 0)))
Y_valid = y_valid.reshape(n_ind*npi*n_time_steps_valid, 1)
X_valid = torch.tensor(X_valid).float()
Y_valid = torch.tensor(Y_valid).float()

n_features = X_train.shape[1]

models = [Net(n_neurons, n_features, dropout) for i in range(N)]


for i, model in enumerate(models):
  print(f'Fitting model {i} ...')
  fit_model(model, epochs, X_train, Y_train, X_valid, Y_valid,)
Fitting model 0 ...
Epoch: 1000 	 Train loss: 0.0002508650650270283 	 Valid loss: 0.0004071235307492316
Epoch: 2000 	 Train loss: 0.00023996594245545566 	 Valid loss: 0.00040422912570647895
Epoch: 3000 	 Train loss: 0.00023474314366467297 	 Valid loss: 0.0004130271845497191
Fitting model 1 ...
Epoch: 1000 	 Train loss: 0.0002466226869728416 	 Valid loss: 0.00040071021066978574
Epoch: 2000 	 Train loss: 0.0002341988729313016 	 Valid loss: 0.00039889742038212717
Epoch: 3000 	 Train loss: 0.00023318800958804786 	 Valid loss: 0.0003919414011761546
Fitting model 2 ...
Epoch: 1000 	 Train loss: 0.0002498300455044955 	 Valid loss: 0.0003950036771129817
Epoch: 2000 	 Train loss: 0.00023865590628702193 	 Valid loss: 0.00039687982643954456
Epoch: 3000 	 Train loss: 0.0002277261664858088 	 Valid loss: 0.0004184243152849376
Fitting model 3 ...
Epoch: 1000 	 Train loss: 0.00024302199017256498 	 Valid loss: 0.0004039230407215655
Epoch: 2000 	 Train loss: 0.00023296280414797366 	 Valid loss: 0.00041294313268736005
Epoch: 3000 	 Train loss: 0.00023424768005497754 	 Valid loss: 0.0004649430338758975
Fitting model 4 ...
Epoch: 1000 	 Train loss: 0.00025030976394191384 	 Valid loss: 0.0003966281365137547
Epoch: 2000 	 Train loss: 0.00023791301646269858 	 Valid loss: 0.0004042745567858219
Epoch: 3000 	 Train loss: 0.00023029415751807392 	 Valid loss: 0.00041665148455649614
Fitting model 5 ...
Epoch: 1000 	 Train loss: 0.0002436818613205105 	 Valid loss: 0.0004103767278138548
Epoch: 2000 	 Train loss: 0.0002334853634238243 	 Valid loss: 0.0004120630619581789
Epoch: 3000 	 Train loss: 0.00023018076899461448 	 Valid loss: 0.0004126749117858708
Fitting model 6 ...
Epoch: 1000 	 Train loss: 0.0002556033432483673 	 Valid loss: 0.00040123239159584045
Epoch: 2000 	 Train loss: 0.0002425243437755853 	 Valid loss: 0.0003935922868549824
Epoch: 3000 	 Train loss: 0.00023646869522053748 	 Valid loss: 0.00039417625521309674
Fitting model 7 ...
Epoch: 1000 	 Train loss: 0.0002523782313801348 	 Valid loss: 0.0004040569765493274
Epoch: 2000 	 Train loss: 0.0002402125537628308 	 Valid loss: 0.0004080838116351515
Epoch: 3000 	 Train loss: 0.0002350450085941702 	 Valid loss: 0.00040917948354035616
Fitting model 8 ...
Epoch: 1000 	 Train loss: 0.0002448004961479455 	 Valid loss: 0.00041951227467507124
Epoch: 2000 	 Train loss: 0.000229840210522525 	 Valid loss: 0.00041069864528253675
Epoch: 3000 	 Train loss: 0.00022234210337046534 	 Valid loss: 0.0004027922113891691
Fitting model 9 ...
Epoch: 1000 	 Train loss: 0.0002454790810588747 	 Valid loss: 0.0004230051126796752
Epoch: 2000 	 Train loss: 0.00023656564007978886 	 Valid loss: 0.00041839986806735396
Epoch: 3000 	 Train loss: 0.00023000931832939386 	 Valid loss: 0.0004211821942590177
Fitting model 10 ...
Epoch: 1000 	 Train loss: 0.0002483158023096621 	 Valid loss: 0.0003964881761930883
Epoch: 2000 	 Train loss: 0.000234818464377895 	 Valid loss: 0.0004061934887431562
Epoch: 3000 	 Train loss: 0.000222624687012285 	 Valid loss: 0.0004086146946065128
Fitting model 11 ...
Epoch: 1000 	 Train loss: 0.000249355478445068 	 Valid loss: 0.0004024350200779736
Epoch: 2000 	 Train loss: 0.0002366793341934681 	 Valid loss: 0.0004024521913379431
Epoch: 3000 	 Train loss: 0.00022948744299355894 	 Valid loss: 0.0004035316815134138
Fitting model 12 ...
Epoch: 1000 	 Train loss: 0.00024561697500757873 	 Valid loss: 0.00040083681233227253
Epoch: 2000 	 Train loss: 0.00022838980657979846 	 Valid loss: 0.0004054875753354281
Epoch: 3000 	 Train loss: 0.00022083648946136236 	 Valid loss: 0.00040789719787426293
Fitting model 13 ...
Epoch: 1000 	 Train loss: 0.0002446445287205279 	 Valid loss: 0.0004010601551271975
Epoch: 2000 	 Train loss: 0.00022973056184127927 	 Valid loss: 0.00040554016595706344
Epoch: 3000 	 Train loss: 0.00022411468671634793 	 Valid loss: 0.0003935765125788748
Fitting model 14 ...
Epoch: 1000 	 Train loss: 0.00025146466214209795 	 Valid loss: 0.00040369381895288825
Epoch: 2000 	 Train loss: 0.00024030216445680708 	 Valid loss: 0.0004035647725686431
Epoch: 3000 	 Train loss: 0.00023493324988521636 	 Valid loss: 0.0004059392085764557
Fitting model 15 ...
Epoch: 1000 	 Train loss: 0.00024309262516908348 	 Valid loss: 0.00040126906242221594
Epoch: 2000 	 Train loss: 0.00023285775387194008 	 Valid loss: 0.00039379208465106785
Epoch: 3000 	 Train loss: 0.0002281484194099903 	 Valid loss: 0.0003939683083444834
Fitting model 16 ...
Epoch: 1000 	 Train loss: 0.00024127463984768838 	 Valid loss: 0.00041748175863176584
Epoch: 2000 	 Train loss: 0.00023192162916529924 	 Valid loss: 0.0004089689755346626
Epoch: 3000 	 Train loss: 0.00022744224406778812 	 Valid loss: 0.0004062617663294077
Fitting model 17 ...
Epoch: 1000 	 Train loss: 0.0002443670528009534 	 Valid loss: 0.000404148711822927
Epoch: 2000 	 Train loss: 0.0002300344203831628 	 Valid loss: 0.00040540017653256655
Epoch: 3000 	 Train loss: 0.00022617366630584002 	 Valid loss: 0.0004077464691363275
Fitting model 18 ...
Epoch: 1000 	 Train loss: 0.00023845398391131312 	 Valid loss: 0.00040168772102333605
Epoch: 2000 	 Train loss: 0.0002224644849775359 	 Valid loss: 0.0004150435561314225
Epoch: 3000 	 Train loss: 0.00021296142949722707 	 Valid loss: 0.0004222855204716325
Fitting model 19 ...
Epoch: 1000 	 Train loss: 0.0002498266112525016 	 Valid loss: 0.0003959094174206257
Epoch: 2000 	 Train loss: 0.00023983922437764704 	 Valid loss: 0.00039493574877269566
Epoch: 3000 	 Train loss: 0.00022892268316354603 	 Valid loss: 0.00039925871533341706
Fitting model 20 ...
Epoch: 1000 	 Train loss: 0.00022800166334491223 	 Valid loss: 0.00041681560105644166
Epoch: 2000 	 Train loss: 0.00022075066226534545 	 Valid loss: 0.00041216512909159064
Epoch: 3000 	 Train loss: 0.00021413722424767911 	 Valid loss: 0.00041918197530321777
Fitting model 21 ...
Epoch: 1000 	 Train loss: 0.0002461010590195656 	 Valid loss: 0.0004059797211084515
Epoch: 2000 	 Train loss: 0.00023865287948865443 	 Valid loss: 0.00040679622907191515
Epoch: 3000 	 Train loss: 0.00023225756012834609 	 Valid loss: 0.00041148404125124216
Fitting model 22 ...
Epoch: 1000 	 Train loss: 0.00023736321600154042 	 Valid loss: 0.0003993347054347396
Epoch: 2000 	 Train loss: 0.0002264958166051656 	 Valid loss: 0.0004273262165952474
Epoch: 3000 	 Train loss: 0.00022050924599170685 	 Valid loss: 0.0004051178402733058
Fitting model 23 ...
Epoch: 1000 	 Train loss: 0.0002564026799518615 	 Valid loss: 0.00041236868128180504
Epoch: 2000 	 Train loss: 0.00024284643586724997 	 Valid loss: 0.00040239820373244584
Epoch: 3000 	 Train loss: 0.00023604398302268237 	 Valid loss: 0.0004029260599054396
Fitting model 24 ...
Epoch: 1000 	 Train loss: 0.0002551157376728952 	 Valid loss: 0.0004068294074386358
Epoch: 2000 	 Train loss: 0.00024311809102073312 	 Valid loss: 0.00040164098027162254
Epoch: 3000 	 Train loss: 0.00023520956165157259 	 Valid loss: 0.00040673359762877226
Fitting model 25 ...
Epoch: 1000 	 Train loss: 0.0002518740948289633 	 Valid loss: 0.00042671780101954937
Epoch: 2000 	 Train loss: 0.000242152382270433 	 Valid loss: 0.0004163305275142193
Epoch: 3000 	 Train loss: 0.0002383065439062193 	 Valid loss: 0.00041217298712581396
Fitting model 26 ...
Epoch: 1000 	 Train loss: 0.0002392594760749489 	 Valid loss: 0.00042069697519764304
Epoch: 2000 	 Train loss: 0.0002272079000249505 	 Valid loss: 0.0004235675442032516
Epoch: 3000 	 Train loss: 0.00022260515834204853 	 Valid loss: 0.00044391441042535007
Fitting model 27 ...
Epoch: 1000 	 Train loss: 0.00023777727619744837 	 Valid loss: 0.0004073498130310327
Epoch: 2000 	 Train loss: 0.00022612173052038997 	 Valid loss: 0.00040711450856179
Epoch: 3000 	 Train loss: 0.00021765446581412107 	 Valid loss: 0.00041556329233571887
Fitting model 28 ...
Epoch: 1000 	 Train loss: 0.00025968998670578003 	 Valid loss: 0.00038338295416906476
Epoch: 2000 	 Train loss: 0.00024011768982745707 	 Valid loss: 0.0003876080154441297
Epoch: 3000 	 Train loss: 0.00022824975894764066 	 Valid loss: 0.0003926944627892226
Fitting model 29 ...
Epoch: 1000 	 Train loss: 0.0002526867319829762 	 Valid loss: 0.0004468766273930669
Epoch: 2000 	 Train loss: 0.00022869124950375408 	 Valid loss: 0.0004210940969642252
Epoch: 3000 	 Train loss: 0.0002248857490485534 	 Valid loss: 0.0004204651922918856

Test set performance

# Average the predictions of all independently trained ensemble members.
y_hat_valid_list = []
for model in models:
    y_hat_valid_list.append(predict(model, X_valid).numpy())

y_hat_valid_mean = np.array(y_hat_valid_list).mean(0)


y_hat_valid = y_hat_valid_mean.reshape(n_ind, npi*n_time_steps_valid)
mse(y_hat_valid, Y_valid.numpy().reshape(n_ind, npi*n_time_steps_valid))    
0.0003699056
ir, rets = get_ir(log_rets=Y_valid.reshape(n_stocks, n_time_steps_valid).T,
       Y_hat=torch.tensor(y_hat_valid.reshape(n_stocks, n_time_steps_valid)).T,
       inv_cov_tensor=inv_cov_tensor.float())
print('Test set information ratio:', ir)
plt.plot(np.cumsum(rets))
plt.title('Test set cumulative portfolio log-returns')
plt.xlabel('days')
plt.ylabel('cumulative portfolio log-returns')
Test set information ratio: 20.05791717754258
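For reference, the annualized information ratio reported above follows the definition in `get_ir`: the mean of the daily portfolio log-returns divided by their standard deviation, scaled by the square root of 252 trading days. A minimal sketch with made-up daily returns (the `daily_rets` series here is simulated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up daily portfolio log-returns, one trading year.
daily_rets = rng.normal(loc=0.001, scale=0.01, size=252)

# Annualize: mean over standard deviation of daily returns, scaled by sqrt(252).
ir = np.mean(daily_rets) / np.std(daily_rets) * np.sqrt(252)
```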

Conclusion

Hierarchical neural networks can successfully model panel data of asset returns with sector-dependent data generating functions by sharing information across sectors through informative priors. That said, standard neural networks capture the sector differences surprisingly well when sector membership is supplied as a one-hot encoded feature matrix.
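The one-hot alternative mentioned above can be sketched as follows. All names and dimensions here are hypothetical: sector membership is one-hot encoded and appended to the base features, so a single pooled network can learn sector-conditional expectations from the augmented input.

```python
import numpy as np

# Hypothetical example: 6 assets spread over 3 sectors, 2 base features each.
sector_ids = np.array([0, 0, 1, 1, 2, 2])   # sector membership per asset
X_base = np.random.randn(6, 2)              # base features

# One-hot encode sector membership and append it to the feature matrix.
one_hot = np.eye(sector_ids.max() + 1)[sector_ids]   # shape (6, 3)
X = np.column_stack((X_base, one_hot))               # shape (6, 5)
```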

CPython 3.6.9
IPython 5.5.0

numpy 1.18.5
numba 0.48.0
hfhd 0.1.4
scipy 1.4.1
theano 1.0.5
pymc3 3.7
matplotlib 3.2.2

compiler   : GCC 8.4.0
system     : Linux
release    : 4.19.112+
machine    : x86_64
processor  : x86_64
CPU cores  : 2
interpreter: 64bit